Overview
Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 999 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 720.8 KiB |
| Average record size in memory | 738.9 B |
Variable types
| Text | 8 |
|---|---|
| Numeric | 6 |
| Categorical | 2 |
| Boolean | 4 |
Alerts
| Alert | Type |
|---|---|
| has_missing_fields has constant value "False" | Constant |
| has_transcript_issues has constant value "False" | Constant |
| has_summary_issues has constant value "False" | Constant |
| compression_ratio is highly overall correlated with summary_length and 1 other field | High correlation |
| summary_length is highly overall correlated with compression_ratio and 1 other field | High correlation |
| transcript_length is highly overall correlated with word_count_transcript | High correlation |
| word_count_summary is highly overall correlated with compression_ratio and 1 other field | High correlation |
| word_count_transcript is highly overall correlated with transcript_length | High correlation |
| priority is highly imbalanced (75.6%) | Imbalance |
| line_number is uniformly distributed | Uniform |
| record_id has unique values | Unique |
| line_number has unique values | Unique |
Reproduction
| Analysis started | 2025-10-01 10:16:24.280579 |
|---|---|
| Analysis finished | 2025-10-01 10:16:25.727778 |
| Duration | 1.45 seconds |
| Software version | ydata-profiling v4.17.0 |
| Download configuration | config.json |
Variables
record_id
Text
Unique
| Distinct | 999 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.6 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 7.8928929 |
| Min length | 6 |
Unique
| Unique | 999 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | line_1 |
|---|---|
| 2nd row | line_2 |
| 3rd row | line_3 |
| 4th row | line_4 |
| 5th row | line_5 |
Words
| Value | Count | Frequency (%) |
| line_8 | 1 | 0.1% |
| line_1000 | 1 | 0.1% |
| line_1 | 1 | 0.1% |
| line_2 | 1 | 0.1% |
| line_3 | 1 | 0.1% |
| line_4 | 1 | 0.1% |
| line_5 | 1 | 0.1% |
| line_985 | 1 | 0.1% |
| line_986 | 1 | 0.1% |
| line_987 | 1 | 0.1% |
| Other values (989) | 989 |
Most occurring characters
| Value | Count | Frequency (%) |
| l | 999 | |
| i | 999 | |
| n | 999 | |
| e | 999 | |
| _ | 999 | |
| 1 | 300 | 3.8% |
| 3 | 300 | 3.8% |
| 7 | 300 | 3.8% |
| 4 | 300 | 3.8% |
| 5 | 300 | 3.8% |
| Other values (5) | 1390 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 7885 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 7885 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 7885 |
line_number
Real number (ℝ)
Uniform Unique
| Distinct | 999 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 500.78178 |
| Minimum | 1 |
| Maximum | 1000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 50.9 |
| Q1 | 251.5 |
| median | 501 |
| Q3 | 750.5 |
| 95-th percentile | 950.1 |
| Maximum | 1000 |
| Range | 999 |
| Interquartile range (IQR) | 499 |
Descriptive statistics
| Standard deviation | 288.82654 |
|---|---|
| Coefficient of variation (CV) | 0.57675129 |
| Kurtosis | -1.1992711 |
| Mean | 500.78178 |
| Median Absolute Deviation (MAD) | 250 |
| Skewness | -0.0020031687 |
| Sum | 500281 |
| Variance | 83420.77 |
| Monotonicity | Strictly increasing |
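The figures above are consistent with line numbers running from 1 to 1000 with exactly one value absent. As a quick arithmetic check (a sketch, assuming the column holds distinct integers in that range, which the report suggests but does not state outright), the reported sum pins down the absent value:

```python
from statistics import mean, median

# Values copied from the line_number tables above.
N_OBSERVATIONS = 999
REPORTED_SUM = 500281
LOW, HIGH = 1, 1000

# Assumption: line_number holds distinct integers in [LOW, HIGH].
full_sum = sum(range(LOW, HIGH + 1))   # sum of every integer from 1 to 1000
missing = full_sum - REPORTED_SUM      # the single absent line number

# Rebuild the column under that assumption and recompute the summary figures.
values = [n for n in range(LOW, HIGH + 1) if n != missing]

print(missing)                 # 219
print(len(values))             # 999
print(round(mean(values), 5))  # 500.78178
print(median(values))          # 501
```

The reconstructed mean and median agree with the reported 500.78178 and 501, so a single gap at 219 would explain why 999 observations span a range of 999.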
Common Values
| Value | Count | Frequency (%) |
| 1000 | 1 | 0.1% |
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 984 | 1 | 0.1% |
| 983 | 1 | 0.1% |
| 982 | 1 | 0.1% |
| Other values (989) | 989 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1000 | 1 | |
| 999 | 1 | |
| 998 | 1 | |
| 997 | 1 | |
| 996 | 1 | |
| 995 | 1 | |
| 994 | 1 | |
| 993 | 1 | |
| 992 | 1 | |
| 991 | 1 |
transcript_length
Real number (ℝ)
High correlation
| Distinct | 243 |
|---|---|
| Distinct (%) | 24.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 631.57357 |
| Minimum | 462 |
| Maximum | 953 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 462 |
|---|---|
| 5-th percentile | 544 |
| Q1 | 591 |
| median | 627 |
| Q3 | 664.5 |
| 95-th percentile | 730 |
| Maximum | 953 |
| Range | 491 |
| Interquartile range (IQR) | 73.5 |
Descriptive statistics
| Standard deviation | 59.036998 |
|---|---|
| Coefficient of variation (CV) | 0.093476041 |
| Kurtosis | 2.2069959 |
| Mean | 631.57357 |
| Median Absolute Deviation (MAD) | 37 |
| Skewness | 0.8240949 |
| Sum | 630942 |
| Variance | 3485.3671 |
| Monotonicity | Not monotonic |
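Several of the figures above are derived from others in the same tables. The following sketch recomputes them from the reported values only (no raw data is assumed), which is a cheap way to confirm the tables are internally consistent:

```python
import math

# Figures copied from the transcript_length tables above.
minimum, maximum = 462, 953
q1, q3 = 591, 664.5
mean, std = 631.57357, 59.036998
n, total = 999, 630942

value_range = maximum - minimum   # compare with the reported Range (491)
iqr = q3 - q1                     # compare with the reported IQR (73.5)
cv = std / mean                   # compare with the reported CV (0.093476041)
variance = std ** 2               # compare with the reported Variance (3485.3671)

print(value_range, iqr)
print(round(cv, 9), round(variance, 4))
# The mean should equal Sum / number of observations.
print(math.isclose(total / n, mean, rel_tol=1e-6))
```

The same relationships (range, IQR, CV = standard deviation / mean, variance = standard deviation squared) apply to every numeric variable in this report.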
Common Values
| Value | Count | Frequency (%) |
| 623 | 14 | 1.4% |
| 663 | 11 | 1.1% |
| 577 | 11 | 1.1% |
| 610 | 11 | 1.1% |
| 585 | 11 | 1.1% |
| 613 | 10 | 1.0% |
| 631 | 10 | 1.0% |
| 639 | 10 | 1.0% |
| 604 | 10 | 1.0% |
| 644 | 10 | 1.0% |
| Other values (233) | 891 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 462 | 1 | |
| 484 | 1 | |
| 487 | 1 | |
| 490 | 1 | |
| 507 | 1 | |
| 508 | 1 | |
| 513 | 1 | |
| 517 | 1 | |
| 518 | 1 | |
| 523 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 953 | 1 | 0.1% |
| 938 | 1 | 0.1% |
| 919 | 1 | 0.1% |
| 845 | 1 | 0.1% |
| 837 | 1 | 0.1% |
| 825 | 2 | |
| 815 | 3 | |
| 814 | 1 | 0.1% |
| 811 | 1 | 0.1% |
| 804 | 1 | 0.1% |
summary_length
Real number (ℝ)
High correlation
| Distinct | 234 |
|---|---|
| Distinct (%) | 23.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 251.2953 |
| Minimum | 78 |
| Maximum | 478 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 78 |
|---|---|
| 5-th percentile | 156 |
| Q1 | 218 |
| median | 255 |
| Q3 | 285 |
| 95-th percentile | 338.1 |
| Maximum | 478 |
| Range | 400 |
| Interquartile range (IQR) | 67 |
Descriptive statistics
| Standard deviation | 53.628761 |
|---|---|
| Coefficient of variation (CV) | 0.21340933 |
| Kurtosis | 0.41930783 |
| Mean | 251.2953 |
| Median Absolute Deviation (MAD) | 32 |
| Skewness | -0.082231057 |
| Sum | 251044 |
| Variance | 2876.044 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 272 | 14 | 1.4% |
| 282 | 13 | 1.3% |
| 251 | 13 | 1.3% |
| 264 | 13 | 1.3% |
| 239 | 12 | 1.2% |
| 287 | 12 | 1.2% |
| 261 | 11 | 1.1% |
| 270 | 11 | 1.1% |
| 257 | 11 | 1.1% |
| 254 | 11 | 1.1% |
| Other values (224) | 878 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 78 | 1 | |
| 82 | 2 | |
| 97 | 1 | |
| 102 | 1 | |
| 110 | 1 | |
| 118 | 1 | |
| 122 | 1 | |
| 123 | 1 | |
| 128 | 2 | |
| 130 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 478 | 1 | |
| 447 | 1 | |
| 415 | 1 | |
| 408 | 1 | |
| 396 | 1 | |
| 391 | 1 | |
| 384 | 1 | |
| 376 | 1 | |
| 373 | 2 | |
| 372 | 1 |
compression_ratio
Real number (ℝ)
High correlation
| Distinct | 975 |
|---|---|
| Distinct (%) | 97.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.39999268 |
| Minimum | 0.12206573 |
| Maximum | 0.80067002 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 0.12206573 |
|---|---|
| 5-th percentile | 0.25258602 |
| Q1 | 0.34228299 |
| median | 0.40536278 |
| Q3 | 0.45650519 |
| 95-th percentile | 0.53403136 |
| Maximum | 0.80067002 |
| Range | 0.67860429 |
| Interquartile range (IQR) | 0.1142222 |
Descriptive statistics
| Standard deviation | 0.087763369 |
|---|---|
| Coefficient of variation (CV) | 0.21941244 |
| Kurtosis | 0.45074473 |
| Mean | 0.39999268 |
| Median Absolute Deviation (MAD) | 0.056856375 |
| Skewness | 0.028797983 |
| Sum | 399.59268 |
| Variance | 0.0077024089 |
| Monotonicity | Not monotonic |
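The report does not define compression_ratio. Its extremes are, however, reproduced by dividing summary_length by transcript_length (both in characters), which also fits the high-correlation alert with summary_length. The sketch below tests that assumed definition; the transcript lengths 639 and 597 are hypothetical values chosen to match, since the report does not show which transcript each summary belongs to.

```python
import math


def compression_ratio(summary_length: int, transcript_length: int) -> float:
    """Assumed definition: summary characters per transcript character."""
    return summary_length / transcript_length


# 78 and 478 are the reported minimum and maximum summary_length.
# 639 and 597 are hypothetical transcript lengths (an assumption, see above).
lowest = compression_ratio(78, 639)
highest = compression_ratio(478, 597)

# Compare with the reported extremes 0.1220657277 and 0.8006700168.
print(math.isclose(lowest, 0.1220657277, abs_tol=1e-9))
print(math.isclose(highest, 0.8006700168, abs_tol=1e-9))
```

Under this reading a smaller ratio means a more aggressively shortened summary, and frequent values such as 0.3333333333 and 0.375 are simple fractions of two character counts.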
Common Values
| Value | Count | Frequency (%) |
| 0.3333333333 | 4 | 0.4% |
| 0.3983606557 | 3 | 0.3% |
| 0.375 | 3 | 0.3% |
| 0.3563402889 | 2 | 0.2% |
| 0.2829912023 | 2 | 0.2% |
| 0.2942097027 | 2 | 0.2% |
| 0.3987034036 | 2 | 0.2% |
| 0.2750424448 | 2 | 0.2% |
| 0.4615384615 | 2 | 0.2% |
| 0.4534050179 | 2 | 0.2% |
| Other values (965) | 975 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0.1220657277 | 1 | |
| 0.1291338583 | 1 | |
| 0.1350906096 | 1 | |
| 0.1556982343 | 1 | |
| 0.1634615385 | 1 | |
| 0.1777059774 | 1 | |
| 0.1853785901 | 1 | |
| 0.1909620991 | 1 | |
| 0.1979320532 | 1 | |
| 0.2002820874 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.8006700168 | 1 | |
| 0.7267657993 | 1 | |
| 0.7017268446 | 1 | |
| 0.6576862124 | 1 | |
| 0.6570048309 | 1 | |
| 0.6269430052 | 1 | |
| 0.6205673759 | 1 | |
| 0.6119133574 | 1 | |
| 0.6097560976 | 1 | |
| 0.6053962901 | 1 |
word_count_transcript
Real number (ℝ)
High correlation
| Distinct | 64 |
|---|---|
| Distinct (%) | 6.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 112.54655 |
| Minimum | 83 |
| Maximum | 166 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 83 |
|---|---|
| 5-th percentile | 97 |
| Q1 | 105 |
| median | 112 |
| Q3 | 118 |
| 95-th percentile | 130 |
| Maximum | 166 |
| Range | 83 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 10.482622 |
|---|---|
| Coefficient of variation (CV) | 0.093140322 |
| Kurtosis | 1.9626728 |
| Mean | 112.54655 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.73697253 |
| Sum | 112434 |
| Variance | 109.88536 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 113 | 45 | 4.5% |
| 115 | 45 | 4.5% |
| 114 | 42 | 4.2% |
| 111 | 42 | 4.2% |
| 106 | 40 | 4.0% |
| 118 | 39 | 3.9% |
| 108 | 38 | 3.8% |
| 112 | 37 | 3.7% |
| 116 | 37 | 3.7% |
| 107 | 36 | 3.6% |
| Other values (54) | 598 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 83 | 2 | 0.2% |
| 85 | 1 | 0.1% |
| 88 | 1 | 0.1% |
| 89 | 1 | 0.1% |
| 90 | 2 | 0.2% |
| 91 | 4 | |
| 92 | 1 | 0.1% |
| 93 | 7 | |
| 94 | 2 | 0.2% |
| 95 | 8 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 166 | 1 | |
| 162 | 2 | |
| 155 | 1 | |
| 150 | 1 | |
| 149 | 1 | |
| 148 | 2 | |
| 146 | 2 | |
| 145 | 1 | |
| 144 | 1 | |
| 142 | 1 |
word_count_summary
Real number (ℝ)
High correlation
| Distinct | 56 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.431431 |
| Minimum | 13 |
| Maximum | 75 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 13 |
|---|---|
| 5-th percentile | 25 |
| Q1 | 35 |
| median | 41 |
| Q3 | 46 |
| 95-th percentile | 56 |
| Maximum | 75 |
| Range | 62 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 9.1595222 |
|---|---|
| Coefficient of variation (CV) | 0.22654459 |
| Kurtosis | 0.19744495 |
| Mean | 40.431431 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | -0.028034678 |
| Sum | 40391 |
| Variance | 83.896847 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 41 | 56 | 5.6% |
| 43 | 51 | 5.1% |
| 44 | 50 | 5.0% |
| 40 | 50 | 5.0% |
| 39 | 45 | 4.5% |
| 42 | 44 | 4.4% |
| 45 | 43 | 4.3% |
| 38 | 38 | 3.8% |
| 37 | 38 | 3.8% |
| 46 | 37 | 3.7% |
| Other values (46) | 547 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 13 | 2 | 0.2% |
| 14 | 1 | 0.1% |
| 16 | 1 | 0.1% |
| 17 | 2 | 0.2% |
| 18 | 3 | 0.3% |
| 19 | 1 | 0.1% |
| 20 | 2 | 0.2% |
| 21 | 5 | |
| 22 | 10 | |
| 23 | 9 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 75 | 1 | 0.1% |
| 71 | 1 | 0.1% |
| 70 | 1 | 0.1% |
| 68 | 1 | 0.1% |
| 67 | 1 | 0.1% |
| 64 | 1 | 0.1% |
| 63 | 1 | 0.1% |
| 62 | 2 | 0.2% |
| 61 | 3 | |
| 60 | 7 |
name
Text
| Distinct | 65 |
|---|---|
| Distinct (%) | 6.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.8 KiB |
Length
| Max length | 19 |
|---|---|
| Median length | 18 |
| Mean length | 4.988989 |
| Min length | 3 |
Unique
| Unique | 30 |
|---|---|
| Unique (%) | 3.0% |
Sample
| 1st row | Ahmed |
|---|---|
| 2nd row | Daniel |
| 3rd row | Daniel |
| 4th row | Jane |
| 5th row | John |
Words
| Value | Count | Frequency (%) |
| john | 178 | |
| james | 136 | |
| joseph | 109 | |
| sam | 80 | 7.9% |
| simon | 68 | 6.7% |
| samuel | 59 | 5.9% |
| david | 57 | 5.7% |
| jacob | 30 | 3.0% |
| peter | 26 | 2.6% |
| grace | 22 | 2.2% |
| Other values (54) | 243 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 588 | |
| J | 506 | |
| o | 495 | |
| e | 477 | |
| m | 460 | |
| n | 361 | 7.2% |
| s | 347 | 7.0% |
| h | 338 | 6.8% |
| S | 229 | 4.6% |
| i | 177 | 3.6% |
| Other values (31) | 1006 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 4984 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 4984 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 4984 |
location
Text
| Distinct | 81 |
|---|---|
| Distinct (%) | 8.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.5 KiB |
Length
| Max length | 36 |
|---|---|
| Median length | 22 |
| Mean length | 7.7897898 |
| Min length | 3 |
Unique
| Unique | 42 |
|---|---|
| Unique (%) | 4.2% |
Sample
| 1st row | Mombasa |
|---|---|
| 2nd row | Kisumu |
| 3rd row | Mombasa |
| 4th row | Kisumu |
| 5th row | Narok |
Words
| Value | Count | Frequency (%) |
| mombasa | 293 | |
| mwanza | 266 | |
| kisumu | 165 | |
| kenya | 88 | 7.4% |
| tanzania | 45 | 3.8% |
| eldoret | 38 | 3.2% |
| kampala | 34 | 2.9% |
| nairobi | 26 | 2.2% |
| mbeya | 25 | 2.1% |
| tororo | 20 | 1.7% |
| Other values (60) | 188 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 1679 | |
| M | 629 | 8.1% |
| s | 508 | 6.5% |
| m | 506 | 6.5% |
| n | 477 | 6.1% |
| o | 463 | 5.9% |
| u | 429 | 5.5% |
| b | 372 | 4.8% |
| i | 361 | 4.6% |
| z | 315 | 4.0% |
| Other values (32) | 2043 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 7782 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 7782 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 7782 |
issue
Categorical
| Distinct | 49 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 61.6 KiB |
Length
| Max length | 51 |
|---|---|
| Median length | 11 |
| Mean length | 14.058058 |
| Min length | 7 |
Unique
| Unique | 26 |
|---|---|
| Unique (%) | 2.6% |
Sample
| 1st row | Child Labor |
|---|---|
| 2nd row | Forced child labor |
| 3rd row | Child labor |
| 4th row | Child labor |
| 5th row | Child Labor |
Common Values
| Value | Count | Frequency (%) |
| Child labor | 315 | |
| Child Labor | 261 | |
| Forced child marriage | 90 | 9.0% |
| Emotional abuse | 85 | 8.5% |
| Child Marriage | 48 | 4.8% |
| Forced child labor | 43 | 4.3% |
| Child marriage | 32 | 3.2% |
| Forced Child Marriage | 24 | 2.4% |
| Neglect | 19 | 1.9% |
| Forced Child Labor | 15 | 1.5% |
| Other values (39) | 67 | 6.7% |
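Many of the labels above differ only in capitalisation ("Child labor" vs "Child Labor"), which inflates the distinct count. As an illustrative sketch, not something the report itself performs, the following merges the ten listed labels after case-folding; the `normalize` helper is a hypothetical convention, and the 39 "Other values" are not included because the report does not list them.

```python
from collections import Counter

# Top ten rows of the Common Values table above: (label, count).
common_values = [
    ("Child labor", 315),
    ("Child Labor", 261),
    ("Forced child marriage", 90),
    ("Emotional abuse", 85),
    ("Child Marriage", 48),
    ("Forced child labor", 43),
    ("Child marriage", 32),
    ("Forced Child Marriage", 24),
    ("Neglect", 19),
    ("Forced Child Labor", 15),
]


def normalize(label: str) -> str:
    """Collapse runs of whitespace and ignore case (one possible convention)."""
    return " ".join(label.split()).casefold()


# Sum the counts of labels that become identical after normalisation.
merged = Counter()
for label, count in common_values:
    merged[normalize(label)] += count

for label, count in merged.most_common():
    print(f"{label}: {count}")
```

With this convention the ten listed labels collapse to six, and "child labor" alone accounts for 576 of the 999 records.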
Words
| Value | Count | Frequency (%) |
| child | 861 | |
| labor | 642 | |
| marriage | 220 | 9.7% |
| forced | 189 | 8.3% |
| abuse | 127 | 5.6% |
| emotional | 116 | 5.1% |
| neglect | 30 | 1.3% |
| and | 21 | 0.9% |
| physical | 15 | 0.7% |
| of | 10 | 0.4% |
| Other values (13) | 37 | 1.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| l | 1407 | |
| a | 1378 | |
| r | 1275 | |
| (space) | 1269 | |
| i | 1240 | |
| o | 1098 | 7.8% |
| d | 1083 | 7.7% |
| h | 878 | 6.3% |
| b | 771 | 5.5% |
| C | 721 | 5.1% |
| Other values (25) | 2924 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 14044 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 14044 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 14044 |
category
Text
| Distinct | 102 |
|---|---|
| Distinct (%) | 10.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 64.2 KiB |
Length
| Max length | 47 |
|---|---|
| Median length | 45 |
| Mean length | 16.707708 |
| Min length | 5 |
Unique
| Unique | 43 |
|---|---|
| Unique (%) | 4.3% |
Sample
| 1st row | Child Exploitation |
|---|---|
| 2nd row | Child Labor |
| 3rd row | Labor exploitation |
| 4th row | Child exploitation |
| 5th row | Work Exploitation |
Words
| Value | Count | Frequency (%) |
| child | 524 | |
| labor | 441 | |
| exploitation | 386 | |
| protection | 191 | 8.9% |
| abuse | 106 | 5.0% |
| marriage | 102 | 4.8% |
| forced | 81 | 3.8% |
| emotional | 53 | 2.5% |
| workplace | 40 | 1.9% |
| violation | 29 | 1.4% |
| Other values (31) | 186 | 8.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 1944 | |
| i | 1825 | 10.9% |
| a | 1314 | 7.9% |
| t | 1300 | 7.8% |
| l | 1180 | 7.1% |
| (space) | 1140 | 6.8% |
| r | 1042 | 6.2% |
| e | 827 | 5.0% |
| n | 695 | 4.2% |
| d | 622 | 3.7% |
| Other values (35) | 4802 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 16691 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 16691 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 16691 |
priority
Categorical
Imbalance
| Distinct | 49 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 54.3 KiB |
Length
| Max length | 107 |
|---|---|
| Median length | 4 |
| Mean length | 6.4934935 |
| Min length | 4 |
Unique
| Unique | 35 |
|---|---|
| Unique (%) | 3.5% |
Sample
| 1st row | Urgent |
|---|---|
| 2nd row | High |
| 3rd row | High |
| 4th row | High |
| 5th row | High |
Common Values
| Value | Count | Frequency (%) |
| High | 802 | |
| Urgent | 105 | 10.5% |
| High urgency | 12 | 1.2% |
| High (urgent action required) | 10 | 1.0% |
| Medium | 7 | 0.7% |
| Moderate | 5 | 0.5% |
| High Priority | 5 | 0.5% |
| High (Urgent action required) | 5 | 0.5% |
| High (Immediate Action Required) | 3 | 0.3% |
| High (Urgent) | 2 | 0.2% |
| Other values (39) | 43 | 4.3% |
Words
| Value | Count | Frequency (%) |
| high | 872 | |
| urgent | 136 | 10.5% |
| action | 31 | 2.4% |
| required | 27 | 2.1% |
| urgency | 18 | 1.4% |
| immediate | 15 | 1.2% |
| the | 12 | 0.9% |
| needed | 11 | 0.8% |
| moderate | 10 | 0.8% |
| to | 10 | 0.8% |
| Other values (63) | 153 | 11.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 1055 | |
| g | 1050 | |
| h | 915 | |
| H | 870 | |
| e | 391 | 6.0% |
| t | 297 | 4.6% |
| (space) | 296 | 4.6% |
| n | 272 | 4.2% |
| r | 261 | 4.0% |
| U | 119 | 1.8% |
| Other values (32) | 961 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 6487 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 6487 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 6487 |
has_missing_fields
Boolean
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 999 |
has_transcript_issues
Boolean
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 999 |
has_summary_issues
Boolean
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| False | 999 |
has_consistency_issues
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 KiB |
| Value | Count | Frequency (%) |
| True | 611 | |
| False | 388 |
victim_info
Text
| Distinct | 269 |
|---|---|
| Distinct (%) | 26.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.5 KiB |
Length
| Max length | 88 |
|---|---|
| Median length | 70 |
| Mean length | 21.062062 |
| Min length | 5 |
Unique
| Unique | 190 |
|---|---|
| Unique (%) | 19.0% |
Sample
| 1st row | 5-year-old girl |
|---|---|
| 2nd row | 13-year-old sister |
| 3rd row | Sammy (14 years old) |
| 4th row | 12-year-old brother |
| 5th row | 13-year-old sister |
Words
| Value | Count | Frequency (%) |
| sister | 403 | |
| 12-year-old | 367 | 13.4% |
| girl | 208 | 7.6% |
| niece | 165 | 6.0% |
| 13-year-old | 142 | 5.2% |
| 5-year-old | 124 | 4.5% |
| 14-year-old | 113 | 4.1% |
| 7-year-old | 101 | 3.7% |
| daughter | 94 | 3.4% |
| a | 90 | 3.3% |
| Other values (185) | 927 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 2190 | 10.4% |
| r | 2015 | 9.6% |
| - | 1849 | 8.8% |
| (space) | 1735 | 8.2% |
| o | 1405 | 6.7% |
| l | 1286 | 6.1% |
| a | 1246 | 5.9% |
| d | 1196 | 5.7% |
| s | 1103 | 5.2% |
| i | 1059 | 5.0% |
| Other values (44) | 5957 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 21041 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 21041 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 21041 |
perpetrator_info
Text
| Distinct | 442 |
|---|---|
| Distinct (%) | 44.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 69.0 KiB |
Length
| Max length | 120 |
|---|---|
| Median length | 69 |
| Mean length | 21.560561 |
| Min length | 4 |
Unique
| Unique | 360 |
|---|---|
| Unique (%) | 36.0% |
Sample
| 1st row | Unknown, but a local factory is suspected |
|---|---|
| 2nd row | Local factory owners |
| 3rd row | Unknown (employer at a local factory) |
| 4th row | Unspecified local workshop owner or workers |
| 5th row | Unnamed factory owners |
Words
| Value | Count | Frequency (%) |
| factory | 305 | 9.4% |
| the | 200 | 6.2% |
| local | 188 | 5.8% |
| unknown | 138 | 4.2% |
| neighbor | 118 | 3.6% |
| owner | 114 | 3.5% |
| family | 112 | 3.4% |
| owners | 108 | 3.3% |
| not | 107 | 3.3% |
| mother | 100 | 3.1% |
| Other values (251) | 1758 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 2249 | 10.4% |
| e | 2084 | 9.7% |
| o | 1836 | 8.5% |
| r | 1512 | 7.0% |
| a | 1324 | 6.1% |
| n | 1313 | 6.1% |
| t | 1196 | 5.6% |
| i | 1065 | 4.9% |
| s | 865 | 4.0% |
| c | 814 | 3.8% |
| Other values (45) | 7281 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 21539 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 21539 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 21539 |
referral_info
Text
| Distinct | 549 |
|---|---|
| Distinct (%) | 55.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 89.1 KiB |
Length
| Max length | 97 |
|---|---|
| Median length | 73 |
| Mean length | 41.943944 |
| Min length | 10 |
Unique
| Unique | 440 |
|---|---|
| Unique (%) | 44.0% |
Sample
| 1st row | Mombasa Child Welfare Society and the police |
|---|---|
| 2nd row | ["Nairobi Children's Office, Relevant labor authorities"] |
| 3rd row | ['Mombasa Labor Commission, Police'] |
| 4th row | Kisumu Child Protection Unit and police |
| 5th row | ["Kisumu Children's Office", 'local police'] |
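The sample rows mix two encodings: free text, and what looks like the printed form of a Python list of strings. A minimal sketch for reading both into one shape follows, assuming those are the only two encodings present (the report shows only five rows, so this is not confirmed). `parse_referrals` is a hypothetical helper name.

```python
import ast


def parse_referrals(raw: str) -> list[str]:
    """Return referral targets as a list of strings.

    Assumption: a cell is either free text or the repr of a Python list.
    """
    text = raw.strip()
    if text.startswith("["):
        try:
            # literal_eval only accepts Python literals, so it is safe on untrusted text.
            parsed = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return [text]  # looks like a list but is malformed; keep it verbatim
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed]
    return [text]


samples = [
    "Mombasa Child Welfare Society and the police",
    "[\"Kisumu Children's Office\", 'local police']",
    "['Mombasa Labor Commission, Police']",
]
for sample in samples:
    print(parse_referrals(sample))
```

Note that the third sample yields a single item: the comma sits inside one quoted string, so splitting it further would need a separate, lossy rule.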
Words
| Value | Count | Frequency (%) |
| police | 909 | |
| office | 796 | |
| children's | 614 | |
| and | 456 | 8.5% |
| local | 305 | 5.7% |
| labor | 283 | 5.2% |
| mombasa | 257 | 4.8% |
| child | 229 | 4.2% |
| mwanza | 222 | 4.1% |
| kisumu | 153 | 2.8% |
| Other values (127) | 1168 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 4393 | 10.5% |
| i | 3529 | 8.4% |
| e | 3250 | 7.8% |
| a | 2611 | 6.2% |
| l | 2547 | 6.1% |
| o | 2546 | 6.1% |
| c | 2366 | 5.6% |
| ' | 1842 | 4.4% |
| n | 1835 | 4.4% |
| f | 1688 | 4.0% |
| Other values (49) | 15295 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 41902 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 41902 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 41902 |
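The referral_info sample rows show at least three storage shapes: plain text, a stringified Python list with one agency per element, and a stringified list holding a single comma-joined string. The following is a minimal sketch of one way to flatten them into a list of agency names. The function name `parse_referral` is hypothetical, and splitting on commas and the word "and" is a heuristic assumption that may not hold for every record.

```python
import ast
import re


def parse_referral(raw: str) -> list[str]:
    """Return a flat list of agency names from one referral_info cell."""
    text = raw.strip()
    items = [text]
    # Cells that look like a Python list literal are parsed safely.
    if text.startswith("[") and text.endswith("]"):
        try:
            parsed = ast.literal_eval(text)
            if isinstance(parsed, (list, tuple)):
                items = [str(p) for p in parsed]
        except (ValueError, SyntaxError):
            pass  # not a valid literal: keep treating the cell as plain text
    agencies = []
    for item in items:
        # Heuristic (assumption): agencies are separated by commas or "and".
        for part in re.split(r",|\band\b", item):
            name = re.sub(r"^the\s+", "", part.strip(), flags=re.IGNORECASE)
            if name:
                agencies.append(name)
    return agencies


plain = parse_referral("Mombasa Child Welfare Society and the police")
joined = parse_referral("['Mombasa Labor Commission, Police']")
listed = parse_referral('''["Kisumu Children's Office", 'local police']''')
```

Parsing with `ast.literal_eval` rather than `eval` keeps the step safe on untrusted text, since it only accepts Python literals.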
intervention_info
Text
| Distinct | 639 |
|---|---|
| Distinct (%) | 64.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 100.6 KiB |
Length
| Max length | 182 |
|---|---|
| Median length | 119 |
| Mean length | 53.996997 |
| Min length | 18 |
Unique
| Unique | 559 |
|---|---|
| Unique (%) | 56.0% |
Sample
| 1st row | Immediate action required: report to child welfare society and police |
|---|---|
| 2nd row | Report to Nairobi Children's Office and relevant labor authorities |
| 3rd row | Reporting to authorities and follow-ups |
| 4th row | Report to authorities, follow-up |
| 5th row | Investigation, rescue, rehabilitation and reintegration of the victim, legal action against perpetrators |
| Value | Count | Frequency (%) |
|---|---|---|
| to | 856 | 10.9% |
| and | 731 | 9.3% |
| report | 701 | 9.0% |
| authorities | 648 | 8.3% |
| the | 515 | 6.6% |
| follow-up | 352 | 4.5% |
| immediate | 256 | 3.3% |
| up | 202 | 2.6% |
| follow | 198 | 2.5% |
| support | 188 | 2.4% |
| Other values (248) | 3172 | 40.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 6820 | 12.6% |
| t | 5392 | 10.0% |
| o | 5299 | 9.8% |
| e | 5063 | 9.4% |
| i | 3692 | 6.8% |
| r | 3380 | 6.3% |
| a | 3044 | 5.6% |
| p | 2486 | 4.6% |
| l | 2387 | 4.4% |
| n | 2163 | 4.0% |
| Other values (45) | 14217 | 26.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 53943 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 53943 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 53943 | 100.0% |
Interactions
Correlations
| | compression_ratio | has_consistency_issues | issue | line_number | priority | summary_length | transcript_length | word_count_summary | word_count_transcript |
|---|---|---|---|---|---|---|---|---|---|
| compression_ratio | 1.000 | 0.061 | 0.000 | -0.022 | 0.000 | 0.894 | -0.263 | 0.865 | -0.248 |
| has_consistency_issues | 0.061 | 1.000 | 0.090 | 0.000 | 0.088 | 0.020 | 0.090 | 0.077 | 0.169 |
| issue | 0.000 | 0.090 | 1.000 | 0.029 | 0.048 | 0.000 | 0.123 | 0.061 | 0.107 |
| line_number | -0.022 | 0.000 | 0.029 | 1.000 | 0.000 | -0.033 | -0.022 | -0.042 | -0.027 |
| priority | 0.000 | 0.088 | 0.048 | 0.000 | 1.000 | 0.000 | 0.093 | 0.000 | 0.057 |
| summary_length | 0.894 | 0.020 | 0.000 | -0.033 | 0.000 | 1.000 | 0.139 | 0.965 | 0.131 |
| transcript_length | -0.263 | 0.090 | 0.123 | -0.022 | 0.093 | 0.139 | 1.000 | 0.134 | 0.934 |
| word_count_summary | 0.865 | 0.077 | 0.061 | -0.042 | 0.000 | 0.965 | 0.134 | 1.000 | 0.140 |
| word_count_transcript | -0.248 | 0.169 | 0.107 | -0.027 | 0.057 | 0.131 | 0.934 | 0.140 | 1.000 |
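The strongly related column pairs can be read off the table programmatically. The sketch below copies the coefficients for the five numeric length and count columns from the table above and lists the pairs whose absolute value meets a cutoff. The cutoff of 0.5 and the helper name `high_pairs` are assumptions chosen for illustration, not settings confirmed by the report.

```python
from itertools import combinations

COLUMNS = [
    "compression_ratio",
    "summary_length",
    "transcript_length",
    "word_count_summary",
    "word_count_transcript",
]

# Upper-triangle coefficients copied from the correlation table.
COEFFS = {
    ("compression_ratio", "summary_length"): 0.894,
    ("compression_ratio", "transcript_length"): -0.263,
    ("compression_ratio", "word_count_summary"): 0.865,
    ("compression_ratio", "word_count_transcript"): -0.248,
    ("summary_length", "transcript_length"): 0.139,
    ("summary_length", "word_count_summary"): 0.965,
    ("summary_length", "word_count_transcript"): 0.131,
    ("transcript_length", "word_count_summary"): 0.134,
    ("transcript_length", "word_count_transcript"): 0.934,
    ("word_count_summary", "word_count_transcript"): 0.140,
}


def high_pairs(threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return column pairs with |coefficient| >= threshold, strongest first."""
    pairs = [
        (a, b, COEFFS[(a, b)])
        for a, b in combinations(COLUMNS, 2)
        if abs(COEFFS[(a, b)]) >= threshold
    ]
    return sorted(pairs, key=lambda p: -abs(p[2]))


for a, b, r in high_pairs():
    print(f"{a} ~ {b}: {r:+.3f}")
```

Under this cutoff, four pairs remain: the two summary-size columns with each other and with compression_ratio, and the two transcript-size columns with each other.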
Missing values
Sample
| | record_id | line_number | transcript_length | summary_length | compression_ratio | word_count_transcript | word_count_summary | name | location | issue | category | priority | has_missing_fields | has_transcript_issues | has_summary_issues | has_consistency_issues | victim_info | perpetrator_info | referral_info | intervention_info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | line_1 | 1 | 663 | 180 | 0.271493 | 123 | 29 | Ahmed | Mombasa | Child Labor | Child Exploitation | Urgent | False | False | False | False | 5-year-old girl | Unknown, but a local factory is suspected | Mombasa Child Welfare Society and the police | Immediate action required: report to child welfare society and police |
| 1 | line_2 | 2 | 612 | 265 | 0.433007 | 106 | 42 | Daniel | Kisumu | Forced child labor | Child Labor | High | False | False | False | True | 13-year-old sister | Local factory owners | ["Nairobi Children's Office, Relevant labor authorities"] | Report to Nairobi Children's Office and relevant labor authorities |
| 2 | line_3 | 3 | 623 | 348 | 0.558587 | 120 | 59 | Daniel | Mombasa | Child labor | Labor exploitation | High | False | False | False | False | Sammy (14 years old) | Unknown (employer at a local factory) | ['Mombasa Labor Commission, Police'] | Reporting to authorities and follow-ups |
| 3 | line_4 | 4 | 603 | 238 | 0.394693 | 108 | 40 | Jane | Kisumu | Child labor | Child exploitation | High | False | False | False | True | 12-year-old brother | Unspecified local workshop owner or workers | Kisumu Child Protection Unit and police | Report to authorities, follow-up |
| 4 | line_5 | 5 | 685 | 176 | 0.256934 | 127 | 28 | John | Narok | Child Labor | Work Exploitation | High | False | False | False | False | 13-year-old sister | Unnamed factory owners | ["Kisumu Children's Office", 'local police'] | Investigation, rescue, rehabilitation and reintegration of the victim, legal action against perpetrators |
| 5 | line_6 | 6 | 606 | 264 | 0.435644 | 107 | 43 | John | Mwanza | Forced child marriage | Child protection | Urgent | False | False | False | False | 12-year-old niece | Family members | ["Mwanza Children's Office", 'police'] | Report to the authorities and seek further assistance |
| 6 | line_7 | 7 | 594 | 285 | 0.479798 | 101 | 45 | Sam | Busia | Child marriage and emotional abuse | Child marriage, Emotional abuse | High (urgent intervention needed) | False | False | False | True | 14-year-old sister | husband | Busia Children's Office, Police | Follow-up calls with the caller |
| 7 | line_8 | 8 | 577 | 270 | 0.467938 | 107 | 44 | Sam | Mwanza | Child labor | Labor exploitation | High | False | False | False | True | 7-year-old sister | Local factory | Mwanza Child Welfare Services and police | Immediate reporting to authorities followed by follow-up |
| 8 | line_9 | 9 | 661 | 229 | 0.346445 | 119 | 36 | James | Kisumu, Kenya | Emotional abuse | Child Protection | High | False | False | False | True | 12-year-old girl | Mother | ["Kisumu Children's Office", 'Local police'] | Report to authorities and follow-up with Childline |
| 9 | line_10 | 10 | 614 | 282 | 0.459283 | 110 | 47 | John | Kisumu | Forced child marriage | Child protection | High | False | False | False | False | 14-year-old daughter of a neighbor | Not specified | ['Kisumu Child Protection Unit', 'local police'] | Report to authorities and follow-up by helpline |
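In the rows above, compression_ratio appears to equal summary_length divided by transcript_length. The report does not state this definition, so treat it as an inference from the sample. A minimal check on the first five rows, using a hypothetical helper name `compression_ratio`:

```python
# (transcript_length, summary_length, reported compression_ratio)
# copied from sample rows 0 to 4.
ROWS = [
    (663, 180, 0.271493),
    (612, 265, 0.433007),
    (623, 348, 0.558587),
    (603, 238, 0.394693),
    (685, 176, 0.256934),
]


def compression_ratio(transcript_length: int, summary_length: int) -> float:
    """Assumed definition: summary characters per transcript character."""
    return summary_length / transcript_length


# The reported values are rounded to six decimals, so allow a small tolerance.
mismatches = [
    row for row in ROWS
    if abs(compression_ratio(row[0], row[1]) - row[2]) > 1e-6
]
print(len(mismatches))
```

If the definition holds across the dataset, it would explain why compression_ratio tracks summary_length closely while transcript_length varies within a comparatively narrow range.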
| | record_id | line_number | transcript_length | summary_length | compression_ratio | word_count_transcript | word_count_summary | name | location | issue | category | priority | has_missing_fields | has_transcript_issues | has_summary_issues | has_consistency_issues | victim_info | perpetrator_info | referral_info | intervention_info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 989 | line_991 | 991 | 661 | 321 | 0.485628 | 112 | 50 | David | Mombasa | Child labor | Labor Exploitation | High | False | False | False | True | A 14-year-old girl | The victim's neighbor | The Children's Office and local police | Immediate report and action by authorities |
| 990 | line_992 | 992 | 682 | 193 | 0.282991 | 122 | 31 | Hassan | Mwanza | Child Labor | Child Labor | High | False | False | False | True | 5-year-old sister | Unnamed family members at a local factory | Mwanza Labor Office and police | Report to authorities and offer support |
| 991 | line_993 | 993 | 659 | 284 | 0.430956 | 116 | 42 | Sam | Kisumu | Child Labor | Labor Exploitation | High | False | False | False | True | 12-year-old sister | Local factory owners | ["Kisumu Children's Office", 'Police'] | Report the issue to the appropriate authorities |
| 992 | line_994 | 994 | 572 | 244 | 0.426573 | 99 | 40 | James | Kisumu, Kenya | Child marriage | Child protection issues | High | False | False | False | True | Around 13-year-old girl | The child's family (not specified) | Children's Office and local police | Immediate action: Report the case to the Children's Office and local police |
| 993 | line_995 | 995 | 601 | 234 | 0.389351 | 108 | 37 | John | Mwanza | Child labor | Child exploitation | High | False | False | False | True | 13-year-old sister | Local factory owner | ["Mwanza Children's Office", 'Labor inspectorate'] | Report to local authorities and follow-up |
| 994 | line_996 | 996 | 644 | 343 | 0.532609 | 111 | 54 | Isaac | Mwanza, Tanzania | Child labor | Child abuse and exploitation | High | False | False | False | True | 5-year-old daughter of Isaac's sister | Isaac's sister | ['Mwanza Child Welfare Office', 'local police'] | Report to authorities, follow-up support |
| 995 | line_997 | 997 | 610 | 240 | 0.393443 | 107 | 40 | Jamal | Kisumu | Emotional abuse | Child protection | High | False | False | False | True | 12-year-old sister | Stepmother | ["Kisumu Children's Office", 'Police'] | Report to authorities and offer support |
| 996 | line_998 | 998 | 577 | 178 | 0.308492 | 111 | 31 | James | Nakuru | Child Labor | Labor Exploitation | High | False | False | False | False | 5-year-old sister | Local shop owner | ['Nakuru Children’s Office, Police'] | Immediate reporting to authorities and follow-up |
| 997 | line_999 | 999 | 658 | 244 | 0.370821 | 113 | 33 | Moses | Tororo | Child Labor | Labor Exploitation | High | False | False | False | True | 5-year-old sister | Not specified | Tororo Children’s Office, police, and local authorities | Immediate reporting and ongoing support |
| 998 | line_1000 | 1000 | 546 | 158 | 0.289377 | 100 | 25 | John | Mwanza | Child Labor | Child Protection | High | False | False | False | True | 12-year-old nephew | Factory Owner | Mwanza Labor Office and the police | Immediate reporting to authorities, follow-up, and potential rescue operation |
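The sample rows contain labels that differ only in letter case, such as "Child Labor" and "Child labor" in the issue column. The sketch below shows one possible normalization before counting categories. The helper name `normalize_label` is hypothetical, and the list holds only the issue values of the first ten sample rows, not the full column.

```python
from collections import Counter


def normalize_label(value: str) -> str:
    """Collapse repeated whitespace and ignore letter case."""
    return " ".join(value.split()).casefold()


# issue values copied from sample rows 0 to 9.
issues = [
    "Child Labor",
    "Forced child labor",
    "Child labor",
    "Child labor",
    "Child Labor",
    "Forced child marriage",
    "Child marriage and emotional abuse",
    "Child labor",
    "Emotional abuse",
    "Forced child marriage",
]

raw_distinct = len(set(issues))
counts = Counter(normalize_label(v) for v in issues)
print(raw_distinct, len(counts), counts.most_common(2))
```

Case folding alone does not merge variants such as "Kisumu" and "Kisumu, Kenya" in the location column; those would need a separate mapping.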